Abstract 

Standard cell placement is an NP-complete problem. The main objectives of a placement algorithm are to 
minimize the chip area and the total wire length of all the nets. Because interconnect dominates in Deep Sub 
Micron (DSM) VLSI design, the design flow does not converge, leading to iterations between the synthesis and 
layout steps. We present a new heuristic placement algorithm called Sankeerna, which tightly couples synthesis 
and routing and produces compact routable designs with minimum area and delay. We tested Sankeerna on several 
benchmarks using a 0.13 micron, 8 metal layer, standard cell technology library. There is an average improvement 
of 46.2% in delay, 8.8% in area and 114.4% in wire length when compared to existing placement algorithms. In 
this paper, we describe the design and implementation of the Sankeerna algorithm and illustrate its performance 
through a worked-out example. 

KEYWORDS : Placement, VLSI Design flow, Synthesis, Routing, Area and delay minimization 

I. Introduction 

VLSI chip complexity has been increasing in line with Moore's law, demanding more functionality and 
higher performance in less design time. Compact, high-performance layouts must be produced 
quickly in order to meet the time-to-market needs of today's VLSI chips. This calls 
for tools that run faster and that converge without leading to design iterations. Placement is 
the major step in the VLSI design flow that decides the area and performance of the circuit. Detailed 
routing is another time-consuming step, which is performed after placement. If the placement is not wire 
planned, routing may lead to heavy congestion, resulting in several Design Rule Check (DRC) 
violations, and it is then necessary to iterate with a new placement. If the wiring is not planned properly 
during placement, the routed circuits may not meet the timing goals of the design. So there is a need for 
placers which are faster, produce compact layouts, meet the timing requirements and make the 
routing converge without DRC violations. The back end process of the VLSI design flow, that is, 
preparation of the layout, must also be tightly coupled with the front end synthesis process to avoid 
design iterations between the synthesis and layout steps. It has been found that even after several 
iterations this two-step process does not converge, and that this timing closure problem [1, 2] 
will not be solved by using wire load models. 

In general, the standard cell placement problem can be stated as follows: given a circuit consisting of 
technology mapped cells with fixed height and variable width, a netlist connecting these cells, 
and Primary Inputs and Primary Outputs, construct a layout fixing the position of each cell so that 
no two cells overlap. The placement, when routed, should have minimum area, wire length and delay, 
and should be routable. Minimum area is an area close to the sum of the mapped standard cell areas. 
Wire length is the sum of the lengths of all nets in the circuit when placed and routed. Delay is the delay 
of the worst path in the routed circuit. Routability means that the layout should not be congested; the 
routed wires should respect the Design Rules of the particular technology so that the routing can be 
completed. Standard cell placement is known to be an NP-complete problem [3]. 

A synergistic approach towards Deep Sub Micron (DSM) design, coupling logic synthesis and 
physical design, is the need of the day [4, 1, 5]. There have been efforts to integrate the synthesis and 
layout steps [6, 7, 8, 9]. All these efforts try to estimate wire delays in the hope that the estimates will 
finally be met, which is not happening. Wire delays are inevitable. The problem is not with wire delays, 
but with non-convergence and unpredictability. What we need is a quick way of knowing the final 
delay and a converging design flow. We have developed a design flow and a placer called Sankeerna, 
targeted at producing compact routable layouts without using wire load models. 

In Section 2, we briefly review the existing methods of placement and their limitations with respect to 
achieving a tightly coupled convergent design flow. Section 3 gives the basis for the Sankeerna 
algorithms. With this background, a new placer called Sankeerna was developed, which is described 
in Section 4. The new placer is illustrated with an example in Section 5. The experimental 
setup used to evaluate Sankeerna is described in Section 6. Results are tabulated and the improvements 
obtained are discussed in Section 7. Conclusions of the research work carried out and the future scope 
are given in Section 8. 

II. Related work 

Classical approaches to placement are reviewed in [10, 11, 3, 12, 13] and recent methods in [14, 15, 
16]. Placement methods are classified based on the way the placement is constructed: they 
are either constructive or iterative [13]. In a constructive method, once the components are 
placed, they are never modified thereafter. An iterative method repeatedly modifies a feasible 
placement by changing the positions of one or more core cells and evaluates the result. Because of the 
complexity, the circuits are partitioned before placement. The constructive methods are (a) 
partitioning-based, which divide the circuit into two or more sub-circuits [17], (b) quadratic 
assignment, which formulates the placement problem as a quadratic assignment problem [18, 19, 20], 
and (c) cluster growth, which places cells sequentially one by one in a partially completed layout 
using a criterion such as the number of cells connected to an already placed cell [21]. The main iterative 
methods are (a) simulated annealing [22, 23, 35], (b) simulated evolution [15, 24] and (c) force-directed [25, 
20]. 

Another classification, based on the placement technique used, was given in [20]. The placers were 
classified into three main categories, namely (a) stochastic placers, which use simulated annealing 
and find the global optimum at the cost of high CPU time, (b) min-cut placers, which recursively divide the 
netlist and chip area, and (c) analytical placers, which define an analytical cost function and minimise 
it using numerical optimization methods. Some placers may use a combination of these techniques. 
These methods use only component cell dimensions and interconnection information and are not 
directly coupled to the synthesis. 

The methods which use structural properties of the circuit are (a) hierarchical placement [27], (b) 
re-synthesis [9] and (c) re-timing. There are algorithms which use signal flow and logic dependency 
during placement [28, 29]. In [28], critical paths are straightened after finding the zigzags. When 
placement is coupled with synthesis, this extra burden of finding criss-crosses is not required. In [30], 
using the structure of the interconnection graph, placement is performed in a spiral topology around 
the centre of the cell array, driven by a Depth First Search (DFS) on the interconnection graph. The 
algorithm has time complexity linear in the number of cells in the circuit. 

To obtain delay optimized placements, timing driven placement methods are used [31, 32, 33, 34, 36, 
37, 38, 39, 40]. The idea is to reduce the wire length on certain paths instead of the total wire length. 
These methods are either path based or net based. In path based methods, the longest path delays are 
minimized; the cost of finding the longest paths grows exponentially with the complexity of the design. In 
net based algorithms, timing constraints are transformed into net-length constraints, and a weighted 
wire length minimized placement is then performed iteratively until better timing is achieved. The drawbacks of 
this method are (a) delay budgeting is done without regard to physical placement feasibility and (b) it is 
iterative. At the end of each iteration, the solution produced is evaluated. 

To control congestion and to achieve routability, white spaces are allocated at the time of placement 
[41, 26, 42, 43]. The problem with this approach is that it increases area, which in turn increases wire 
length and delay. In [44], it was shown that minimising wire length improves routability and layout 
quality. Allocating white space may therefore not be the right approach to achieve routability: the white 
space allocated may not be where the router needs it. It is better to minimise the wire length instead of 
allocating white spaces. 

The studies in [45] have shown that existing placement algorithms produce significantly inferior 
results when compared with the estimated optimal solutions. The studies in [12] show that the results 
of leading placement tools from both industry and academia may be 50% to 150% away from 
optimal in total wire length. 

Design flow convergence is another main requirement of the synergistic approach towards DSM 
design [4]. Placement plays a major role in this. As mentioned in [4], there are three types of design 
flows for Deep Sub Micron (DSM) design, namely (a) logic synthesis drives DSM design, (b) physical 
design drives DSM design and (c) a synergistic approach towards DSM design. In the last method (c), it is 
required to create iteration loops which tightly couple the various levels of the design flow. The 
unpredictability of the area, delay and routability of the circuits from synthesis to layout is another 
major problem. The studies in [46, 47] indicated that the non-convergence of the design process is due to 
the non-coupling of synthesis [48, 49, 50, 51] to the placement process. We need a faster way of estimating 
area and delay from pre-place to post-route. If we fail to achieve this, we may not have a clue to 
converge. 

From the above analysis of the state of the art placement algorithms, we feel that there is still scope 
for improvement and the need for better placement algorithms meeting the requirements as mentioned 
in section 3. In the next Section, we describe the basis for our new algorithms, which try to solve 
some of the problems mentioned above. 

III. Basis for the New Algorithms 

The new placement algorithm should have the following features: (a) linear or polynomial time 
complexity with respect to the number of cells in the circuit, (b) awareness of synthesis and routing 
assumptions and expectations, that is, tight coupling of synthesis and routing as mentioned in [4, 46, 
47], (c) minimum area and delay, (d) routable layouts without Design Rule Check 
(DRC) violations, achieved by proper wire planning during placement, (e) a final layout delay that is 
predictable with trial routes and (f) a smooth interface with synthesis and routing tools. 

In this section, we explain the basis for the Sankeerna algorithms. Since the circuits are to be placed 
to achieve minimum area and delay, we first try to find out what minimum area and delay 
are achievable for a given circuit and technology library. The minimum area achievable is the 
sum of the widths of all cells multiplied by the height of the standard cells. For a given standard cell 
technology library, the height is the same for all cells. 
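
As a small illustration of the area bound described above, the following C sketch sums the cell widths and multiplies the total by the common cell height. The function name, parameter names and units are assumptions made for illustration only; they are not taken from the Sankeerna implementation.

```c
#include <stddef.h>

/* Lower bound on layout area: (sum of all cell widths) * (common cell height).
 * Widths and height are assumed to be in the same length unit, for example
 * microns. All names here are hypothetical. */
double min_layout_area(const double *cell_widths, size_t n_cells, double cell_height)
{
    double total_width = 0.0;
    for (size_t i = 0; i < n_cells; i++)
        total_width += cell_widths[i];
    return total_width * cell_height;
}
```

For example, three cells of widths 1, 2 and 3 units in a library with a cell height of 4 units give a bound of 24 square units; any real placement occupies at least this area.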

To find out the minimum delay achievable, we use Figure 1(a) to explain the delay calculation 
process. 

Figure 1 (a) Logic & timing dependency (b) possible locations to place a cell 

There are 4 gates marked g1, g2, g3 and G. The gates g1, g2 and g3 are at level i and G is at level 
i+1 as per the logic of the circuit. Let d be the delay of gate G, a the maximum of the arrival times of (g1, g2, 
g3) at the input of G, b the block delay of G, f the fan-out delay of G and w the wire delay of G. Then d is 
equal to the sum of a, b, f and w. The delay d depends on the arrival times of the inputs g1, g2 and g3, which 
are in the transitive fan-in of G. The gates g1, g2 and g3 in turn have inputs either from Primary 
Inputs (PIs) or from other gates. So the delay of G depends on the transitive fan-in of all gates 
connected to g1, g2 and g3. To minimise d, the only thing that can be done at placement time is to 
minimize the wire delay w. Wire delay w depends on the wire lengths s1, s2 and s3, and on several other 
factors. We consider only wire length here. To minimize the wire lengths s1, s2 and s3 of the 
outputs of gates g1, g2 and g3, these gates are to be placed at the physically closest locations to G. The possible 
places for a cell H near a cell G are shown in Figure 1(b). H can be placed in any of 
the eight positions indicated by NW, N, NE, W, E, SW, S and SE. The distance from the H output pin to 
the G input pin for these 8 possible locations depends on the width and height of H and G, and on the pin 
locations on these two cells. The lines 1, 2, 3, 4, 5, 6, 7 and 8 in Figure 1 show the Euclidean distance 
for all 8 positions. The same procedure can be adopted to calculate the Manhattan distance. The 
physically shortest place is the location which has the minimum Manhattan distance. In real 
technology libraries, the pin positions are specified by a rectangle or a set of rectangles, not just as a 
point. So we can choose to connect anywhere on the rectangle, based on the closeness to the 
destination cell. Out of the available locations, the one with the minimum Manhattan distance is on the 
Physically Shortest Path (PSP). 
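
The delay sum and the choice among the eight neighbouring positions can be sketched in C as follows. This is a minimal sketch under stated assumptions: each cell is modelled as a rectangle with a single point pin given relative to its lower-left corner, H abuts G in each of the eight positions, and all type and function names (`Cell`, `candidate_origin`, `closest_position` and so on) are hypothetical rather than taken from Sankeerna.

```c
/* Assumed geometry: a cell is a rectangle; `pin` is one point pin, given
 * relative to the cell's lower-left corner. All names are hypothetical. */
typedef struct { double x, y; } Point;
typedef struct { double width, height; Point pin; } Cell;

enum Position { POS_NW, POS_N, POS_NE, POS_W, POS_E, POS_SW, POS_S, POS_SE, POS_COUNT };

static double abs_d(double v) { return v < 0.0 ? -v : v; }

/* d = a + b + f + w: arrival time, block delay, fan-out delay, wire delay. */
double gate_delay(double a, double b, double f, double w) { return a + b + f + w; }

/* Lower-left corner of H when it abuts G at position p. */
Point candidate_origin(Point g_origin, const Cell *g, const Cell *h, enum Position p)
{
    Point o = g_origin;
    if (p == POS_NW || p == POS_W || p == POS_SW) o.x = g_origin.x - h->width;
    else if (p == POS_NE || p == POS_E || p == POS_SE) o.x = g_origin.x + g->width;
    if (p == POS_NW || p == POS_N || p == POS_NE) o.y = g_origin.y + g->height;
    else if (p == POS_SW || p == POS_S || p == POS_SE) o.y = g_origin.y - h->height;
    return o;
}

/* Manhattan distance from the output pin of H to the input pin of G. */
double pin_distance(Point g_origin, const Cell *g, Point h_origin, const Cell *h)
{
    double dx = (h_origin.x + h->pin.x) - (g_origin.x + g->pin.x);
    double dy = (h_origin.y + h->pin.y) - (g_origin.y + g->pin.y);
    return abs_d(dx) + abs_d(dy);
}

/* Position with the minimum Manhattan pin-to-pin distance (first one wins ties). */
enum Position closest_position(Point g_origin, const Cell *g, const Cell *h)
{
    enum Position best = POS_NW;
    double best_dist = pin_distance(g_origin, g, candidate_origin(g_origin, g, h, POS_NW), h);
    for (int p = POS_N; p < POS_COUNT; p++) {
        Point o = candidate_origin(g_origin, g, h, (enum Position)p);
        double d = pin_distance(g_origin, g, o, h);
        if (d < best_dist) { best_dist = d; best = (enum Position)p; }
    }
    return best;
}
```

With G's input pin on its left edge and H's output pin on its right edge at the same height, the sketch selects the W position, where the two pins coincide. This matches the observation in the text that neighbours in the same row tend to give the shortest connection.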

Let r be the required time of a gate and a be the arrival time; then the slack s at gate G is r minus a. 
Out of all the inputs to a gate G, the one with the Worst Negative Slack (WNS) is on 
the critical path. The inputs among g1, g2 and g3 which are more critical are to be placed closer to gate G 
than the others. This argument has to be applied recursively to all gates which are in the 
transitive fan-in of g1, g2 and g3. That is, placing g1 nearer to G means placing all the cells which 
are in the transitive fan-in of g1 nearer to G. All gates which are in the transitive fan-in of g1 are to be placed 
on the PSP, giving priority to cells with the worst (most negative) slack. Let the WNS of g1, g2 and g3 be -2, -1 and -3 
respectively. The placement priority is g3 first, then g1 and last g2. The minimum delay is achieved when 
g1, g2 and g3 are placed on the PSP from Primary Outputs (POs) to Primary Inputs (PIs). 

Sankeerna uses a constructive method of placement. Starting from the Primary Output (PO), cells are 
placed on the PSP as explained above. The height to width ratio of the smallest drive capability inverter 
is 4 for the standard cell library we have used for conducting the experiments in this paper. So the row of 
a standard cell becomes the Physically Shortest Path (PSP). The WNS at each node input decides the 
Delay-wise Shortest Path (DSP). Sankeerna combines the PSP and DSP concepts explained above to produce 
routable placements, minimizing area, delay and wire length in linear time. These concepts are further 
illustrated with a worked-out example in Section 5. The next Section explains the algorithms used in 
Sankeerna. 
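
The slack rule and the resulting placement priority can be sketched in C as follows. This is an illustrative sketch only: the `GateTiming` structure and the function names are assumptions, and the required and arrival times used in the usage note are chosen simply to reproduce the slacks -2, -1 and -3 from the text.

```c
#include <stdlib.h>

/* Timing data for one gate input; the structure and field names are hypothetical. */
typedef struct {
    const char *name;
    double required;   /* required time r */
    double arrival;    /* arrival time a  */
} GateTiming;

/* Slack s = r - a; the most negative slack is the most critical. */
double slack(const GateTiming *g) { return g->required - g->arrival; }

/* qsort comparator: ascending slack, so the most critical input comes first. */
static int by_criticality(const void *pa, const void *pb)
{
    double sa = slack((const GateTiming *)pa);
    double sb = slack((const GateTiming *)pb);
    return (sa > sb) - (sa < sb);
}

void sort_by_criticality(GateTiming *inputs, size_t n)
{
    qsort(inputs, n, sizeof inputs[0], by_criticality);
}
```

Taking a required time of 0 and arrival times of 2, 1 and 3 for g1, g2 and g3 gives slacks of -2, -1 and -3, and sorting yields the order g3, g1, g2 stated above.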

IV. Algorithms used in Sankeerna 

We have modified the algorithms of ANUPLACE [46] to produce delay optimized placements using 
a constructive method. Sankeerna reads the benchmark circuit, which is in the form of a netlist taken 
from the "SIS" synthesizer [52], and builds trees with Primary Outputs (POs) as roots. In SIS, we specify zero 
as the required time at the POs and print the worst slack at each node. This slack information is read 
into Sankeerna along with the netlist. At each node, the inputs are sorted based on this slack, in descending order 
of time criticality. Starting from the root and proceeding to the leaf nodes, nodes are placed on the layout 
after finding the closest free location. At each node, the most time critical input is placed first using a 
modified Depth First Search (DFS) method, in which priority is given to timing rather than to the depth of the 
tree. It was proved that placement of trees can be done in polynomial time [7]. A Depth First Search 
(DFS) algorithm was used in [30], which has time complexity linear in the number of connections. 
Tree based placement algorithms reported in the literature have either linear time or O(n log n) time 
complexity [7, 53, 54, 55]. 

We have used benchmark circuits from the SIS [52] synthesizer in "BLIF" format, which are then 
converted into Bookshelf [56] format using the converters provided in [59, 60, 41, 61]. The normal 
placement benchmark circuits [57, 45] are not useful because they give only cell dimensions and 
interconnect information. Timing, cell mapping, logic dependency and other circuit information from 
the synthesizer are not available in these placement benchmarks. The converters do not use technology 
library information for cell dimensions and pin locations. Sankeerna reads the technology 
information, consisting of (a) cell names, (b) cell dimensions (height and width), (c) pin locations on the 
cells, (d) timing information and (e) input load, from a file. Using this technology information, 
Sankeerna generates the benchmark circuit in Bookshelf format with actual cell dimensions and pin 
locations. Once the trees are created, the slack information is read into the tree data structure. 
The Sankeerna algorithm is shown in Figure 2. 



Main 

• Read technology library. 
• Read the benchmark circuit. 
• Build trees with primary outputs as roots. 
• Read cell mapping and delay file. 
• Print Verilog. 
• Sort inputs of each node of the tree based on time criticality. 
• Sort trees of Primary Outputs (POs) based on time criticality. 
• Put level information in each node of the tree. 
• Print the benchmark with technology cell dimensions and pin locations. 
• Run Public Domain Placer (PDP). 
• Read PDP placement. 
• Calculate the layout width and height using standard cell area and aspect ratio. 
• Number of rows = layout height / standard cell height. 
• Initialize row tables to keep track of the placement as it is constructed. 
• Place circuit. 
• Place Primary Inputs (PIs) and Primary Outputs (POs). 
• Print ".def" files of PDP and Sankeerna. 



void place_ckt () 
{ next_PO = pointer to list of trees 
            pointed to by Primary Outputs (POs); 
  no = number of PO cells; 
  for ( i=0; i<no; i++ ) 
  { place_cell ( next_PO ); 
    next_PO = next_PO->next; 
  } 
} 

void find_best_place ( gate ) 
{ check_availability_on_row on the same row, and 
      above and below current row; 
  Out of available rows, find the row 
      which gives minimum wire length; 
  return coordinates of minimum location; 
} 

void check_availability_on_row ( row, width ) 
{ x1 = row_table_x1[row]; x2 = row_table_x2[row]; 
  if ( ( fabs ( x2-x1 ) + width ) <= layout_width ) 
      return ( possible, x2 ); 
  else return ( not_possible ); 
} 

void place_cell ( gate ) 
{ next_pin = pointer to list of input pins; 
  place_one_cell ( gate ); 
  for ( i=0; i < ( gate->no_of_inputs ); i++ ) 
  { place_cell ( next_pin ); 
    next_pin = next_pin->next; 
  } 
} 

void place_one_cell ( gate ) 
{ find_best_place ( gate ); 
  place_cell_on_layout_surface ( gate ); 
} 



Figure 2 Sankeerna algorithms 

As shown in Figure 2, the "place_ckt" function places the trees pointed to by the Primary Outputs (POs) 
one after the other using the "place_cell" function, starting at row 1 at x=0 and y=0. The "place_cell" 
function works as follows. 

• Place the cell pointed to by the root using "place_one_cell". 

• For each input, if it is a primary input, place it using "place_one_cell"; if not, call 
"place_cell" with this input recursively. 
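
The recursive traversal described in the two steps above can be sketched in C as a pre-order walk that visits the most critical input first and records the order in which cells would be placed. This is a hedged sketch, not the Sankeerna source: the `Node` structure, the fixed `MAX_INPUTS` bound and the function names are assumptions, criticality is modelled as "most negative slack first", and the actual search for a layout location is replaced by assigning a sequence number.

```c
#include <stddef.h>

#define MAX_INPUTS 4   /* assumed upper bound on gate fan-in, for illustration */

/* One node of a tree rooted at a Primary Output; all names are hypothetical. */
typedef struct Node {
    const char *name;
    double slack;                      /* more negative means more critical */
    struct Node *inputs[MAX_INPUTS];
    int n_inputs;
    int sequence;                      /* order in which the cell is placed */
} Node;

/* Insertion sort of the inputs by ascending slack (most critical first). */
static void sort_inputs(Node *n)
{
    for (int i = 1; i < n->n_inputs; i++) {
        Node *key = n->inputs[i];
        int j = i - 1;
        while (j >= 0 && n->inputs[j]->slack > key->slack) {
            n->inputs[j + 1] = n->inputs[j];
            j--;
        }
        n->inputs[j + 1] = key;
    }
}

/* Place the root, then recurse into its inputs in order of criticality.
 * Returns the next unused sequence number. */
int assign_sequence(Node *root, int next)
{
    root->sequence = next++;           /* stands in for place_one_cell */
    sort_inputs(root);
    for (int i = 0; i < root->n_inputs; i++)
        next = assign_sequence(root->inputs[i], next);
    return next;
}
```

For a root Y with inputs A (slack -1) and B (slack -3), where B has a single input C, calling `assign_sequence(&y, 1)` numbers the cells Y=1, B=2, C=3, A=4: the whole subtree of the more critical input is placed before the less critical input is visited.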

The "place_one_cell" function finds the best place to place the cell using "find_best_place" function 
and places the cell at this location. The "find_best_place" function works as follows. As the 
placement progresses, the "used space" and "available space" are marked, C is the current cell and 
the next cell will be placed closer to this cell wherever space is available. The current cell placement 
is depicted in Figure 3. 

The "find_best_place" function checks the availability of space using the "check_availability_on_row" 
function on the same row as C and on the rows above and below the current row of C. The possible 
places for a cell H near a cell G are shown in Figure 3. Out of the available locations, the 
one with the minimum distance from the parent cell is chosen, and the cell is placed at this location. The 
"check_availability_on_row" function keeps two pointers, x1 and x2, for each row. Both 
are initialised to zero before the start of placement. When this function is invoked, it gets the two 
pointers x1 and x2 from the table entry corresponding to this row. It then calculates whether, in the 
available space, a cell of the given "width" can be placed such that the row width remains less than or equal 
to the layout width. If space is available, it returns x2 as the available location on this row. 

Figure 3 Find_best_place possible locations and the layout seen by Find_best_place 
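
The row-table check described above can be sketched in C as follows. The test itself follows the pseudocode of Figure 2; the `RowTable` structure, the `MAX_ROWS` bound and the `mark_placed` helper are assumptions added for illustration, since the text does not say how the pointers are advanced after a cell is placed.

```c
#include <stdbool.h>

#define MAX_ROWS 64    /* assumed upper bound on rows, for illustration */

/* Per-row occupancy: x1 is the start and x2 the end of the used span. */
typedef struct {
    double x1[MAX_ROWS];
    double x2[MAX_ROWS];
    double layout_width;
} RowTable;

/* Returns true and writes the free x coordinate to *x_out if a cell of the
 * given width still fits on the row without exceeding the layout width. */
bool check_availability_on_row(const RowTable *t, int row, double width, double *x_out)
{
    double used = t->x2[row] - t->x1[row];
    if (used < 0.0)
        used = -used;
    if (used + width <= t->layout_width) {
        *x_out = t->x2[row];
        return true;
    }
    return false;
}

/* Hypothetical helper: advance x2 after placing a cell at the end of the row. */
void mark_placed(RowTable *t, int row, double width)
{
    t->x2[row] += width;
}
```

With a layout width of 10 units, a 4-unit cell is offered x=0 on an empty row, a following 6-unit cell is offered x=4, and any further cell is rejected because the row is full. Each check is a single comparison, which is the property the complexity argument relies on.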

Complexity of the algorithm 

If n is the number of cells in the circuit, the "place_one_cell" function is invoked n times. 
"place_one_cell" calls "find_best_place", which uses "check_availability_on_row". So 
"find_best_place" is executed n times. Each time, it calculates wire lengths for the possible locations to 
choose the best one. "check_availability_on_row" performs one comparison operation. So, after the 
construction of the tree, the number of operations is linearly proportional to n, and the complexity of 
the algorithm is of the order of n. 

V. Illustration of Sankeerna with an example 

The algorithms used in Sankeerna are illustrated with an example whose logic equation is given 
below, taken from [49]. 

Y = [sum-of-products expression over the inputs a to i; the product terms are illegible in the source] 

The logic diagram with technology mapped cells and the tree built by Sankeerna for the above logic 
equation with the slacks are shown in Figure 4 and 5 along with the sequence of placement based on 
the time criticality after synthesis using SIS [52]. 




Figure 4 Logic Diagram with technology mapped cells for the example equation 

The sequence of placement is indicated by the numbers 1-26 shown at each node of the tree. 
There are 9 primary inputs, marked a, b, c, d, e, f, g, h and i, and one primary output, marked 
Y. Sankeerna places the Primary Output cell Y first, at x=0 in row 1. Then it looks at its input cells 
Ua356 and Ua358. Based on the time criticality given in Figure 5, it places cell Ua356 first, after finding the 
best place. The algorithm is then invoked recursively to place the tree rooted at Ua356, which 
places the cells and the inputs in the sequence 3 to 21. 













Figure 5 Tree built by Sankeerna for the example 

Figure 6 Sankeerna and Public Domain Placer (PDP) placements of the example 

Once the placer completes the placement of the tree with Ua356 as root, it starts placing the tree 
rooted at cell Ua358. The cells marked 22 to 26 are now placed. This completes the placement of 
the complete circuit. Primary Inputs and Primary Outputs are re-adjusted after placing all the cells. The 
final placement is shown in Figure 6(a). The cell names, sequence numbers and pin locations of the 
cells are shown in Figure 6(a). These diagrams are taken from the Cadence® SOC Encounter® back 
end tool [58]. The placement given by the Public Domain Placer (PDP) [59, 60, 41, 61] for this 
example is shown in Figure 6(b). The layouts of these placements after carrying out detailed 
routing with Cadence® SOC Encounter® [58] are shown in Figure 7. The results are shown in Table 
1, Serial Number 24, as "example". 




Figure 7 Sankeerna & PDP layout of example 

The experimental set up to evaluate Sankeerna using benchmark circuits is explained in the next 
section. 

VI. Test setup 

In this section, we describe the test set up used to evaluate Sankeerna. The test set up is shown in 
Figure 8. 

The benchmark circuits are taken in the form of a PLA. Some of them are from the MCNC benchmarks. 
We also added a few other circuits such as multiplexers, adders and multipliers. We have used a 0.13 
micron, 8 metal layer, standard cell technology library for these experiments. The public domain SIS 
synthesizer [52] is used for synthesizing the benchmark circuits. 

We created the library file required for SIS in genlib format using the information from the data 
sheets and ".lef" files of this particular technology. Three files, namely (a) the delay file, (b) the cell 
mapping file and (c) the BLIF file, are generated from SIS. The delay file consists of the node name and slack at 
each node. Only block delays and fan-out delays are considered. No wire delay or wire load models 
are used. This delay file is created using the information generated by the "print_delay" command of SIS. 
The cell mapping file consists of the node name and the mapped library cell name. This file is created using 
the information generated by the "print_gate" command of SIS. The BLIF file is created using the 
"write_blif" command of SIS. The BLIF output is then converted into Bookshelf format using the 
public domain tools available at the web site [59, 60, 41, 61] using the utility "blif2book-Linux.exe 
filename.blif filename". Using the 0.13 micron standard cell technology files, Sankeerna generates a file 
in Bookshelf format using the cell dimensions and pin locations of the given technology library. This 
file is used for placement by Sankeerna and also by the Public Domain Placer (PDP) [59]. 

In the PDP flow, benchmarks are placed using "time LayoutGen-Lnx32.exe -f filename.aux -AR 
1.5 -saveas filename" [59]. The width to height ratio is 3:2, which is the same for Sankeerna. Sankeerna 
gives the placement output in ".def" format [62] (".def file"). The mapped netlist is given out in the 
form of a structural Verilog file (".v file"). Cadence® SOC Encounter® v06.10-p005_1 [58] is used for 
routing and calculating the delay of the placement produced. The Verilog (.v file) and placement (.def 
file) are read into SOC Encounter®. The 0.13 micron standard cell technology files, consisting of 
".lef" files and timing library files (".tlf"), are read into SOC Encounter®. We did a trial route and then 
detailed routing using Cadence NanoRoute® v06.10-p006 [58]. 

Figure 8 Test setup showing Sankeerna VLSI Design Flow 

The delays, Worst Negative Slack (WNS) and Total Negative Slack (TNS), "from all inputs to all 
outputs" are noted. The CPU time for detailed routing is also noted. Various other characteristics of 
the placed circuits, namely standard cell area, core area, wire length and Worst Negative Slack (WNS), are 
noted. Results for various benchmarks are shown in Table 1. We compared the results produced by 
Sankeerna with a Public Domain Placer (PDP) [59]. The PDP flow is also shown in Figure 8. The 
PDP uses the benchmark circuits in Bookshelf format. The BLIF file generated by SIS is converted 
into Bookshelf format with cell dimensions and pin locations as per the 0.13 micron standard cell 
library. We have used the same aspect ratio for Sankeerna and PDP. The aspect ratio for these 
experiments was 0.4 for height and 0.6 for width. To have a fair comparison, the Primary Inputs and 
Primary Outputs are placed at the boundary of the core for the placements produced by both Sankeerna 
and PDP. The output from PDP [59] is in Bookshelf format (.pl, .scl files). This is converted into 
".def" format. The netlist is generated in the form of a structural Verilog file. All the utilities used to 
convert the above file formats were developed as part of the Sankeerna software package. The Verilog, ".def", 
technology (.lef) and timing (.tlf) files are read into Cadence® SOC Encounter® 
[58]. The detailed routing and delay calculations are then carried out. A Linux machine with dual Intel® 
Pentium® 4 CPU @ 3.00GHz and 2 GB memory was used for running Sankeerna and PDP. For 
running SOC Encounter® [58], a Sun Microsystems SPARC Enterprise® M8000 Server with 960 
MHz CPU and 49 GB memory was used. The results are tabulated in Table 1. 

VII. Results and Discussion 

Table 1 shows the results of the circuits placed using the existing Public Domain Placer (PDP) [59] 
and Sankeerna. 

Table 1 Test Results 

There is an average improvement of 24% in delay (Worst Negative Slack, WNS), 8% in area and 75% 
in wire length after detailed routing with NanoRoute of Cadence [58] when compared to the Public 
Domain Placer (PDP). Sankeerna uses only 3% extra area over the standard cell area, as shown 
under "S" in Table 1, whereas PDP uses 12%. For the bigger benchmarks, namely alu4, e64, 
mul5510 and add889, there are thousands of DRC violations in the case of PDP. This is due to its increased 
wire length. So those placements are not useful. 

To compare Sankeerna with a commercial tool, we conducted the following experiment. We placed 
the benchmark circuit "alu4" using Sankeerna and did the detailed routing and delay calculation as 
mentioned earlier, and noted the results. We then specified the same width and height 
dimensions in the Commercial Tool Placer (CTP) after reading the Verilog file into the tool, 
ran the timing driven placement of the commercial tool and carried out detailed routing. The CPU 
times used by Sankeerna and the Commercial Tool Placer (CTP) are given in Table 2. SOC Encounter® 
took 2:27 CPU time for detailed routing using NanoRoute® [58] for the Sankeerna placement, without any 
DRC violations. SOC Encounter® took 6:45 CPU time for detailed routing using NanoRoute® for 
CTP's timing driven placement, with 144 DRC violations in Metal layer 1. The layouts produced for 
the Sankeerna placement and CTP's placement are shown in Figure 9. The black marks in the CTP layout 
in Figure 9 are the DRC violations. These are shown separately in the same Figure (extreme right 
block). Since there are so many DRC violations, the layout is not useful. 

Figure 9 Layouts of ALU4 with Sankeerna & CTP 



Table 2 shows results of other benchmarks comparing with Commercial Tool timing driven Placer 
(CTP). 

Table 2 Comparing Sankeerna with Commercial Tool Placer (CTP) 
For the same floor plan dimensions, CTP took more CPU time and could not produce a usable final 
placement because of the DRC violations shown in the last column of Table 2. Sankeerna 
placements were always routed without DRC violations, because wire planning is already done 
during placement using the shortest wires. This avoids iterations between the placement and 
detailed routing steps. Although the WNS and TNS values shown in Table 2 are comparable, the 
layout produced by CTP is not usable because of its DRC violations, so its delay values have no 
meaning. Our method therefore took much less time and produced better results than CTP. 

Pre-place delay is computed without taking wire delays into consideration. We have not used wire 
load models in SIS, because wire load models are not good estimates of the final delay [2]. The post 
route delay includes the delay due to wires and is the final delay of the circuit. We have compared 
pre-place delay with post route delay in Table 3. 

The percentage difference in Worst Negative Slack (WNS) between pre-place and post route is much 
smaller for Sankeerna placements (51.30% versus 122.09%) than for Public Domain Placer (PDP) 
placements. This is because PDP uses more wire than Sankeerna. 
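
The text does not spell out how the percentage differences in Table 3 are computed. A minimal 
sketch, assuming the conventional relative difference between pre-place and post route delay, is 
given below. The function name and the delay values are hypothetical and are not taken from 
Table 3. 

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, as a percentage of |before|."""
    if before == 0:
        raise ValueError("reference value must be non-zero")
    return (after - before) / abs(before) * 100.0


# Hypothetical delays in ns, for illustration only (not values from Table 3).
pre_place_delay = 2.00   # delay without wire delays
post_route_delay = 3.00  # delay including wire delays after detailed routing

print(f"{percent_change(pre_place_delay, post_route_delay):.2f}%")  # prints 50.00%
```

Under this assumption, a smaller percentage means that the delay seen by the synthesizer is closer 
to the delay of the final routed layout. 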

Cadence® SOC Encounter® [58] has a facility called "Trial route", which performs quick global 
and detailed routing, creating actual wires and estimating routing related congestion and capacitance 
values. We ran trial route on the benchmarks and noted the wire lengths, WNS and TNS values for 
Sankeerna and PDP placements. The results are compared in Table 4. 

Table 4 Trial route Versus Detail (Nano) route [58], wire length and delay comparison 
The percentage differences from trial route to detailed (Nano) route [58] are shown in the last 4 
columns of Table 4 for both Sankeerna and PDP. For Sankeerna, wire length decreased by 0.82% and 
WNS increased by 1.19%. The PDP placements took 10.39% more wire, and WNS increased 
by 7.49%. The delays produced by trial route are therefore better estimates for Sankeerna than 
for PDP placements, and a quick delay estimate can be obtained for Sankeerna 
placements. All these advantages contribute to the tighter coupling of the VLSI design flow envisaged in 
[4] for evolving a Synergistic Design Flow. Sankeerna scales well as complexity increases, because 
it uses constructive placement. The placement was found to converge after routing in all the 
test cases we tried, that is, the layout generated was free of DRC violations after routing. To support 
this point, we have taken the bigger benchmarks from Table 1, and the improvements obtained are shown in 
Table 5. 

Table 5 As complexity increases, Sankeerna performs better over PDP. 
As can be seen from Table 5, Sankeerna took only 1.3% extra area over the standard cell area, 
whereas PDP took 11% extra area. The 1.3% extra area is left at the right edges of all rows. This is 
the bare minimum area required to make the placement possible. The area improvement of 
Sankeerna over PDP is 8.8%. The most interesting fact is that wire length is 114.4% more for PDP 
when compared to Sankeerna. WNS improved by 46.2%. There are several DRC violations after 
detailed routing in the case of PDP, so those layouts are not usable. Sankeerna used the bare minimum 
area and produced routable layouts with better timing. Area, wire length and delay are interrelated. If 
we use minimum area, as is the case with Sankeerna, we require shorter wires, which in turn leads to 
minimum delay. Using the Sankeerna flow thus avoids design iterations, because the flow is tightly 
coupled from synthesis to layout. Sankeerna produced compact layouts which are better in 
delay when compared to Public Domain Placer (PDP) and Commercial Tool's timing driven Placer 
(CTP). 
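
The percentages above are stated relative to different baselines, so a short worked calculation 
may help in reading them. The sketch below is illustrative only. The function names are 
hypothetical, and it treats the quoted averages as if they applied to a single circuit, which the 
per-circuit data in Table 5 need not satisfy exactly. 

```python
def saving_from_excess(excess_pct: float) -> float:
    """If B uses `excess_pct` percent more than A, return how much less A uses than B (percent)."""
    return (1.0 - 1.0 / (1.0 + excess_pct / 100.0)) * 100.0


def area_saving(overhead_a_pct: float, overhead_b_pct: float) -> float:
    """Area saved by A over B when both are given as overheads over the same cell area (percent)."""
    area_a = 1.0 + overhead_a_pct / 100.0
    area_b = 1.0 + overhead_b_pct / 100.0
    return (1.0 - area_a / area_b) * 100.0


# PDP using 114.4% more wire means Sankeerna uses about 53.4% less wire than PDP.
print(f"{saving_from_excess(114.4):.1f}%")  # prints 53.4%

# Overheads of 1.3% versus 11% over the standard cell area give a saving of about 8.7%.
print(f"{area_saving(1.3, 11.0):.1f}%")     # prints 8.7%
```

The second figure is close to the reported 8.8% area improvement. The small gap is what one would 
expect if the reported value is an average of per-circuit improvements rather than a ratio of 
average overheads. 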

VIII. Conclusions and Future scope 

We now summarise the features of Sankeerna for fitting it into a Synergistic Design Flow (SDF) [4] 
as mentioned in Section 3. 

The first requirement was (a) linear time complexity with respect to the number of cells in the circuit. 
The Sankeerna placement algorithms have linear time complexity after construction of the tree. In a 
tightly coupled Synergistic Design Flow (SDF) [4], the trees are already built by the synthesizer and 
can be used directly by the placer, so there is no need to construct them separately. Due to its 
linear time complexity, Sankeerna scales well as the circuits become bigger. 

The second requirement was (b) awareness of synthesis and routing assumptions and expectations, 
that is, tight coupling of synthesis and routing as mentioned in [4]. We have used logic dependency 
and timing information from the synthesizer. This information guides the placement properly, as 
mentioned in [46, 28, 29, 30, 47]. As shown in Table 3, the pre-place to post route delay variation for 
Sankeerna was 51.30%, compared to 122.09% for Public Domain Placer (PDP) [59]. Depending on the 
circuit, the values vary from 39.39% to 78.6% for Sankeerna, whereas for PDP they vary from 62.33% to 
242.99%. So Sankeerna is more closely coupled to the synthesizer's estimates than 
PDP. Sankeerna placements were always routed without DRC violations, as shown in 
Tables 1 and 2, whereas PDP has thousands of violations for the bigger circuits even after using 11% 
extra space. Sankeerna used only 0% to 1% extra over the area calculated by 
the synthesizer, which is the bare minimum over the standard cell area. For the same floor plan 
dimensions, Commercial Tool's timing driven Placer (CTP) produced hundreds of DRC violations, as 
shown in Table 2, compared to zero DRC violations for Sankeerna. In Sankeerna, routability is 
achieved without white space allocation, because the placements produced by Sankeerna use minimum 
length wires. As mentioned in Section 2, white space allocation increases area, which in turn increases 
wire length. In conclusion, the placements produced by Sankeerna were always routable because they 
use less wire than those of PDP and CTP. 

The third requirement was (c) achieving minimum area and delay. The area increase was only 1.3% over 
the standard cell area, which is calculated by the synthesizer. This value for PDP was 11%. As shown 
in Table 5, Sankeerna performed better than Public Domain Placer (PDP) [59] as the complexity 
increased. PDP used 114.4% more wire than Sankeerna, and Sankeerna's delay was better by 46.2% 
when compared to PDP. 

The fourth requirement was (d) that the placement produced should be routable without Design Rule Check 
(DRC) violations, that is, wire planning has to be done during placement. As shown in Table 5, PDP 
could not produce usable placements due to congestion, which resulted in thousands of DRC violations in 
four out of 8 test cases. So the design flow did not converge in those cases for PDP, and its behaviour 
is unpredictable, because it converged in some cases and not in others. Sankeerna always converged, 
and its convergence is predictable due to minimum wire length and proper wire planning done during 
placement. As shown in Table 2, the same non-convergence and unpredictability were noticed in the 
case of Commercial Tool's Placer (CTP). 

The fifth requirement was (e) that the delay of the final layout should be predictable with trial routes. 
As shown in Table 4, for Sankeerna, wire length decreased by 0.82% from trial route to detailed route, 
whereas it increased by 10.39% for PDP. The delay increase was 1.19% for Sankeerna, whereas it was 7.49% 
for PDP. Thus, the wire planning done by Sankeerna was maintained after routing, whereas it varied 
in the case of PDP. Trial route [58] was a good estimate for Sankeerna. So there is convergence and 
tight coupling between placement and routing in the case of Sankeerna. 

The sixth requirement was (f) that the placer should interface smoothly with synthesis and routing tools. 
As shown in Figure 8, the test setup for the Sankeerna design flow, Sankeerna interfaces smoothly 
with the existing synthesizer and router. We have not used any wire load models during synthesis, as 
it was demonstrated that they are not useful [2]. The slack at each node and the cell mapping 
information were already available from the synthesizer. The router was interfaced through a Verilog 
netlist and a "def" [62] file for placement information. Pre-place and trial route delay calculations were 
done by the commercial tool [58], and these were good estimators in the case of Sankeerna for a real 
standard cell technology library. As can be seen from the experiments, no design iterations 
among synthesis, placement and routing were needed for Sankeerna to achieve the results shown. 

The area, wire length and delay calculations for pre-place, trial route and post route were done by the 
commercial tool [58]. This gives confidence that there is no error in measuring these values in 
these experiments. 

The features and effectiveness of Sankeerna in comparison with other published placement techniques 
are elaborated here. In Sankeerna, cells which are logically dependent are placed close to each 
other [29], whereas in other placement algorithms the cells are randomly scattered, creating zigzags 
and criss-crosses that lead to increased congestion, wire length and delay. The random scattering 
of the cells also leads to unpredictability in the final layout, which results in non-convergent 
iterations. Because wires are shorter in our placement and are planned by selecting the closest 
locations during placement, congestion is lower and detailed routing always completes using 
minimum area for wires. This automatically leads to minimum delays. The most critical paths 
automatically get higher priority, without resorting to path based placement, whose cost grows 
exponentially with circuit complexity and is computationally expensive. As can be seen from 
Figure 5, Sankeerna first establishes the most critical path, and the rest of the logic is placed around it 
based on logic dependency. This is similar to the denser path placement of [30]. So the most 
critical path is placed along the physically and delay wise shortest path, as mentioned in Section 3. Since 
our method is constructive, it scales well for bigger circuits. We are planning to test with bigger test 
cases in future. The circuit is naturally partitioned when Sankeerna builds the trees rooted at the 
Primary Outputs (POs). So there is no additional burden of extracting cones as in [27, 29] or 
partitioning the circuit, as is the case in most placers. Global signal flow is kept in mind 
throughout the placement by using Depth First Search (DFS) for all sub trees rooted at various levels of 
logic, unlike other placement methods, which randomly scatter the cells. Trial route [58] can be used 
for a quick estimate of delay, which is a good estimate in the case of Sankeerna as explained earlier. As 
mentioned in [2], using wire load models misleads the whole design process, resulting in 
non-convergence. So the Sankeerna flow does not use wire load models. The Sankeerna flow is always 
convergent and tightly coupled, and gives estimates of area, wire length and delay using existing 
layout tools like Trial route of Cadence's SOC Encounter® [58] that are not far from the 
values obtained after detailed routing. Thus the Sankeerna approach is useful towards evolving a 
Synergistic Design Flow (SDF), which is to create iteration loops that are tightly coupled at the 
various levels of design flow, as mentioned in [4]. 
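
The ordering idea described above, where the most critical path is visited first and logically 
dependent cells are kept together by a depth first traversal from the primary outputs, can be 
illustrated with a small sketch. This is not the Sankeerna implementation. The netlist 
representation (`fanin`), the `slack` values and the function name `placement_order` are 
hypothetical, and the sketch only produces a visiting order without assigning row positions. 

```python
def placement_order(fanin, slack, primary_outputs):
    """Return cells in DFS order from the primary outputs, most critical (lowest slack) first.

    fanin: dict mapping a cell to the list of cells driving its inputs (assumed format).
    slack: dict mapping every cell to its timing slack; lower means more critical.
    """
    order, seen = [], set()
    # Start from the most critical primary output.
    for po in sorted(primary_outputs, key=lambda n: slack[n]):
        stack = [po]
        while stack:
            node = stack.pop()
            if node in seen:
                continue  # shared logic is placed once, with its most critical user
            seen.add(node)
            order.append(node)
            # Push the least critical fan-in first so the most critical one is popped next.
            for child in sorted(fanin.get(node, []), key=lambda n: slack[n], reverse=True):
                if child not in seen:
                    stack.append(child)
    return order


# Tiny hypothetical netlist: two outputs sharing gate g2.
fanin = {"o1": ["g1", "g2"], "g1": ["a", "b"], "g2": ["b", "c"], "o2": ["g2", "d"]}
slack = {"o1": -0.5, "g1": -0.5, "a": -0.5, "b": -0.3,
         "g2": -0.1, "c": -0.1, "o2": 0.2, "d": 0.2}

print(placement_order(fanin, slack, ["o1", "o2"]))
# prints ['o1', 'g1', 'a', 'b', 'g2', 'c', 'o2', 'd']
```

In this example the critical path o1, g1, a is emitted first, and the remaining cells follow in 
order of logic dependency. Each cell is visited once and each connection is examined once, so 
apart from sorting the fan-ins of each cell, the traversal is linear in the size of the netlist. 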